Binary Neural Architecture Search
[Figure 4.4 diagram: a loop of "Sample without Replacement & Train" and "Compute" the child/parent evaluation indicator, repeated K times; the minimum-performing operation on each edge is removed to reduce the search space.]
FIGURE 4.4
The main framework of the proposed Child-Parent search strategy. In a loop, we first sample an operation without replacement for each edge of the search space, and then train the child and parent models generated from the same architecture simultaneously. Second, we use Eqs. 4.15 and 4.28 to compute the evaluation indicator from the accuracies of both models on the validation set. Once all operations have been sampled, we remove the operation with the worst performance on each edge.
loss between child and parent networks. It is observed that the worst operations in the early stage usually remain among the worst in the end. On the basis of this observation, we remove the operation with the worst performance according to the performance indicator. This process is repeated until only one operation is left on each edge. We reformulate the traditional loss function as a kernel-level Child-Parent loss for the binarized optimization of the child-parent model.
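The sampling-and-removal loop described above can be sketched in Python. This is a minimal illustration under assumed names, not the chapter's implementation: the `evaluate` callback (standing in for the performance indicator computed from the child and parent models) and the edge/operation encoding are assumptions introduced here.

```python
import random

def child_parent_search(edges, num_ops, evaluate):
    """Iteratively shrink the search space: in each reduction round,
    sample operations without replacement for every edge, score each
    sampled architecture with `evaluate` (higher is better), and drop
    the worst-scoring operation on each edge until one remains.
    """
    # candidate operation ids still alive on each edge
    candidates = {e: list(range(num_ops)) for e in edges}
    while any(len(ops) > 1 for ops in candidates.values()):
        k = max(len(ops) for ops in candidates.values())
        # shuffle each edge's pool so sampling is without replacement
        pools = {e: random.sample(ops, len(ops)) for e, ops in candidates.items()}
        scores = {e: {} for e in edges}
        for i in range(k):  # K rounds: every op appears at least once per edge
            arch = {e: pools[e][i % len(pools[e])] for e in edges}
            perf = evaluate(arch)
            for e, op in arch.items():
                scores[e].setdefault(op, perf)  # keep first observed score
        for e in edges:
            if len(candidates[e]) > 1:
                worst = min(candidates[e], key=lambda op: scores[e][op])
                candidates[e].remove(worst)
    return {e: ops[0] for e, ops in candidates.items()}
```

With a single edge and a score that simply rewards larger operation ids, the loop removes ids 0, 1, 2 in turn and keeps id 3, matching the "remove the worst, keep one per edge" behavior described in the text.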
4.3.1 Child-Parent Model for Network Binarization
Network binarization constrains neural networks to 1-bit weights and activations that approximate the full-precision network, and it can significantly compress deep convolutional neural networks (CNNs). Previous work [287] usually investigates the binarization problem by exploiting the full-precision model to guide the optimization of the binarized model. Based on this investigation, we reformulate NAS-based network binarization as a Child-Parent model, as shown in Fig. 4.5. The child and parent models are the binarized model and its full-precision counterpart, respectively.
Conventional NAS is inefficient due to the complicated reward computation in network training, where a structure is usually evaluated only after network training converges. Some methods instead evaluate a cell during network training. The work in [292] points out that the best choice in the early stages is not necessarily the final optimal one; however, the worst operation in the early stages usually still performs poorly in the end, and this phenomenon becomes increasingly significant as training progresses. On the basis of this observation, we propose a simple yet effective operation-removing process, which is the key task of the proposed CP-model.
Intuitively, two main aspects should be considered when defining a reasonable performance evaluation measure: the gap between the abilities of the child and the parent, and how well the child can handle its task independently. Our Child-Parent model introduces a performance indicator along these lines to improve search efficiency. The indicator includes two parts: the performance loss between the binarized network (child) and the full-precision network (parent), and the performance of the binarized network (child) itself.
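A scalar form of this two-part indicator can be sketched as follows. The symbols and the trade-off weight `beta` are illustrative assumptions, not the exact Eqs. 4.15 and 4.28 from the text:

```python
def performance_indicator(acc_parent, acc_child, beta=1.0):
    """Hypothetical combination of the two parts described above:
    (1) the performance loss between the full-precision parent and the
    binarized child, and (2) the child's own performance.
    A smaller gap and a stronger child both raise the score.
    """
    performance_loss = acc_parent - acc_child  # how far the child lags
    # beta trades off closing the gap against raw child accuracy
    return acc_child - beta * performance_loss
```

Under this form, an operation whose child matches the parent (zero performance loss) is scored purely by the child's own accuracy, which is the behavior the two-part definition suggests.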